
Conversation

@markurtz
Collaborator

Summary

[Screenshot 2025-10-31 at 1 51 32 PM]

Details

  • [ ]

Test Plan

Related Issues

  • Resolves #

  • "I certify that all code in this PR is my own, except as noted below."

Use of AI

  • Includes AI-assisted code completion
  • Includes code generated by an AI application
  • Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes ## WRITTEN BY AI ##)

@markurtz markurtz requested review from Copilot and sjmonson October 31, 2025 17:52
@markurtz markurtz self-assigned this Oct 31, 2025
Contributor

Copilot AI left a comment


Pull Request Overview

This PR performs a significant refactoring to reorganize the codebase's statistics, schemas, and benchmarking components. The main changes consolidate utilities into schemas, remove deprecated presentation modules, and restructure the test suite to align with the new organization.

Key changes:

  • Moved statistics utilities from utils/statistics.py to schemas/statistics.py and updated to use probability density functions (PDFs) instead of cumulative distribution functions (CDFs); a rough sketch follows this list
  • Relocated pydantic utilities from utils/pydantic_utils.py to schemas/base.py
  • Removed deprecated presentation modules (injector.py, data_models.py, builder.py)
  • Complete rewrite of statistics tests to use parametrized fixtures and broader distribution coverage
  • Added new console utilities for formatted table printing
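
As a rough illustration of what a PDF-based summary over weighted samples can look like (a sketch only; the function and field names below are assumptions for illustration, not the actual schemas/statistics.py API):

    # Hypothetical sketch: summarize weighted samples via a normalized PDF
    # rather than an accumulated CDF. Not the real guidellm implementation.
    import numpy as np

    def pdf_summary(values: np.ndarray, weights: np.ndarray, bins: int = 100) -> dict:
        density, edges = np.histogram(values, bins=bins, weights=weights, density=True)
        centers = (edges[:-1] + edges[1:]) / 2  # midpoints of each histogram bin
        mean = float(np.average(values, weights=weights))
        return {"pdf_x": centers.tolist(), "pdf_y": density.tolist(), "mean": mean}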

Reviewed Changes

Copilot reviewed 59 out of 61 changed files in this pull request and generated 15 comments.

Summary per file

  • tests/unit/utils/test_statistics.py: Complete rewrite with parametrized fixtures testing multiple probability distributions
  • tests/unit/utils/test_pydantic_utils.py: Updated import path from utils to schemas
  • tests/unit/presentation/*: Removed deprecated presentation test files
  • tests/unit/mock_benchmark.py: Updated class name from BenchmarkSchedulerStats to BenchmarkSchedulerMetrics
  • src/guidellm/utils/statistics.py: Deleted; moved to schemas/statistics.py
  • src/guidellm/schemas/statistics.py: New file with refactored statistics using a PDF-based approach
  • src/guidellm/schemas/base.py: New file consolidating pydantic utilities
  • src/guidellm/utils/console.py: Enhanced with table printing capabilities and improved documentation
  • src/guidellm/utils/functions.py: Added safe_format_number utility (a hypothetical sketch follows this list)
  • Multiple schema files: Updated imports and restructured benchmark schemas
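
safe_format_number isn't shown in this summary; a plausible shape for such a helper (purely an assumption about its behavior, not the actual utils/functions.py code) is:

    # Hypothetical sketch of a safe_format_number-style helper: format a value
    # with fixed precision, falling back to a placeholder for None/NaN/bad input.
    import math
    from typing import Any

    def safe_format_number(value: Any, precision: int = 2, default: str = "-") -> str:
        try:
            number = float(value)
        except (TypeError, ValueError):
            return default
        if math.isnan(number) or math.isinf(number):
            return default
        return f"{number:.{precision}f}"
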
Comments suppressed due to low confidence (3)

  • src/guidellm/data/preprocessors/formatters.py:44: class GenerativeTextCompletionsRequestFormatter(RequestFormatter)
  • src/guidellm/data/preprocessors/formatters.py:118: class GenerativeChatCompletionsRequestFormatter(RequestFormatter)
  • src/guidellm/data/preprocessors/formatters.py:307: class GenerativeAudioTranscriptionRequestFormatter(RequestFormatter)


@sjmonson
Collaborator

sjmonson commented Oct 31, 2025

Ran this benchmark:

guidellm benchmark \
  --target http://vllm-standalone-granite-3-2b.llmd.svc.cluster.local \
  --data "prompt_tokens=4096,prompt_tokens_stdev=512,prompt_tokens_min=2048,prompt_tokens_max=6144,output_tokens=512,output_tokens_stdev=128,output_tokens_min=1,output_tokens_max=1024" \
  --max-seconds 10 \
  --profile concurrent \
  --rate 10

And got the following error:

...
  File "/root/guidellm/src/guidellm/benchmark/benchmarker.py", line 161, in run
    benchmark = benchmark_class.compile(
        accumulator=accumulator,
        scheduler_state=scheduler_state,
    )
  File "/root/guidellm/src/guidellm/benchmark/schemas/generative/benchmark.py", line 134, in compile
    metrics=GenerativeMetrics.compile(accumulator),
            ~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
  File "/root/guidellm/src/guidellm/benchmark/schemas/generative/metrics.py", line 797, in compile
    incomplete = accumulator.incomplete.get_within_range(start_time, end_time)
  File "/root/guidellm/src/guidellm/benchmark/schemas/generative/accumulator.py", line 623, in get_within_range
    if (stats.request_end_time >= start_time)
        ^^^^^^^^^^^^^^^^^^^^^^
  File "/root/guidellm/src/guidellm/schemas/request_stats.py", line 81, in request_end_time
    raise ValueError("resolve_end timings should be set but is None.")
ValueError: resolve_end timings should be set but is None.

Edit: Seems to be due to max-seconds constraint.
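
That would be consistent with requests still in flight when the max-seconds constraint stops the run: they never get a resolve_end timestamp, so the request_end_time property raises when the accumulator filters by time range. A minimal sketch of one possible guard (attribute names other than those in the traceback are guesses, and the fix itself is an assumption):

    # Hypothetical guard inside get_within_range: skip requests whose resolve_end
    # was never recorded (e.g. cut off by max-seconds) instead of raising.
    in_range = [
        stats
        for stats in incomplete_stats  # hypothetical iterable of per-request stats
        if stats.resolve_end is not None  # assumed accessor for the missing timing
        and start_time <= stats.request_end_time <= end_time
    ]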

Collaborator

@sjmonson sjmonson left a comment


Aside from the max-seconds bug, here are some high-level notes:

  • Needs signoff
  • Needs cleanup. To retroactively run pre-commit on only the changed files: pre-commit run --from-ref $(git merge-base main@{u} HEAD) --to-ref HEAD
  • Only glanced over the accumulator and statistics code. Will do a more in-depth review time permitting but don't block on it.
  • Is warmup/cooldown only seconds now and not sometimes a percent? Setting --warmup .1 results in a table entry of Warm Sec: 0.1.

@markurtz markurtz force-pushed the features/refactor/progress_refactor branch from bfc4b2f to 4a951c4 on November 4, 2025 15:19
@markurtz markurtz force-pushed the features/refactor/progress_refactor branch from 73e7539 to 6c7f133 on November 4, 2025 17:26
Collaborator

@jaredoconnell jaredoconnell left a comment


I can confirm that the max-seconds error is fixed. I'll do more testing tomorrow, but it works for me.
The new CLI table outputs are a little busy to look at, but they work. Adding more horizontal padding may help, but isn't necessary.

Comment on lines +209 to +225
self._add_field(
    headers,
    values,
    "Run Info",
    "Requests",
    json.dumps(benchmark.config.requests),
)
self._add_field(
    headers, values, "Run Info", "Backend", json.dumps(benchmark.config.backend)
)
self._add_field(
    headers,
    values,
    "Run Info",
    "Environment",
    json.dumps(benchmark.config.environment),
)
Collaborator


The requests and environment values are not the prettiest, but they work and don't exclude any information.
The output doesn't preserve the formatting of the original input the way main currently does, but if that's acceptable then I'm okay with proceeding.
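
If keeping the cell contents readable matters, one low-effort option (a suggestion only, not what the PR currently does) is to pretty-print the nested config instead of dumping it on a single line:

    # Suggestion only: indent nested config values so the table cell is easier
    # to scan; the PR currently uses the single-line json.dumps shown above.
    self._add_field(
        headers,
        values,
        "Run Info",
        "Environment",
        json.dumps(benchmark.config.environment, indent=2, sort_keys=True),
    )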

  worker = WorkerProcess[RequestT, ResponseT](
      worker_index=rank,
-     messaging=self.messaging.create_worker_copy(
+     messaging=self.messaging.create_worker_copy(  # type: ignore[arg-type]
Collaborator


It may make sense to explain this ignored type error. If someone tries to fix it, this one is a confusing one.
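
For example, a brief comment next to the suppression would help future readers. The wording below is illustrative only; the actual reason for the type mismatch should be confirmed before committing it:

    # Illustrative only: document why the arg-type error is suppressed.
    # create_worker_copy is assumed to return an un-parameterized messaging
    # type that mypy cannot match to the RequestT/ResponseT generics expected
    # here, even though the object is compatible at runtime.
    messaging=self.messaging.create_worker_copy(  # type: ignore[arg-type]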
